The internet of things (IoT) describes the network of physical objects equipped with
sensors and other technologies to exchange data with other devices over the Internet.
Owing to its inherent flexibility, the field-programmable gate array (FPGA) has become a
viable platform for IoT development. However, security threats such as FPGA
bitstream cloning and intellectual property (IP) piracy have become a major concern
for these devices. The physical unclonable function (PUF) is a promising hardware
fingerprinting technology for addressing these problems. Several PUFs have been proposed,
including FPGA implementations of the reconfigurable-XOR PUF (R-XOR PUF) and the
multi-PUF (MPUF). However, these PUFs have drawbacks, such as
high delay imbalances caused by routing constraints. Therefore, in this study, we
explore a relative placement method to implement symmetric routing in an obfuscated
delay-based PUF on an FPGA board. The delay analysis indicates that the method
achieves symmetric routing. As a result, our design achieves
good PUF quality, with a uniqueness of 48.75%, a reliability of 99.99%, and a uniformity
of 52.5%. Moreover, by using the obfuscation method, which combines an Arbiter-PUF
with a random challenge permutation technique, we reduced the prediction accuracy of
machine learning attacks on the Arbiter-PUF to 44.50%.
1. INTRODUCTION


The internet of things (IoT) is an ecosystem of networked physical objects accessible via the Internet.
It is an embedded technology that allows devices to communicate with each other [1]. It also allows a device
to monitor and understand a scenario or environment without human assistance. With the increasing adoption
of wireless fidelity (Wi-Fi) and fourth-generation (4G) long term evolution (LTE) wireless internet access, the
evolution towards ubiquitous information and communication networks is already apparent [2]. The IoT can
also be described as a data exchange environment where devices are connected to wired and cellular networks
[3]. IoT applications exist in various fields such as smart cities, healthcare, smart homes, and industrial
security. For example, with an IoT-based remote monitoring system, it becomes easier to monitor the overall
performance of the system through a web-based approach rather than using an on-site monitoring meter [4].
According to Bao et al. [5], the field-programmable gate array (FPGA) has proven to be a viable platform for
IoT development due to its inherent flexibility and reconfigurability.

An FPGA is an integrated circuit that combines internal hardware blocks with user-programmable
interconnects to customize operation for a particular application. The interconnects can be easily reprogrammed,
allowing an FPGA to adapt to design changes. FPGAs evolved from earlier devices such as programmable
read-only memories (PROMs) and programmable logic devices (PLDs). While these devices could be
programmed at the factory or in the field, they were based on fuse technology and could not be modified once
programmed. In contrast, FPGAs store their configuration data in a reprogrammable medium such as static
random access memory (SRAM) or flash memory. A typical FPGA design consists of thousands of basic
elements called configurable logic blocks (CLBs) surrounded by a system of programmable interconnects known
as a fabric that routes signals between the CLBs. The FPGA and external devices are interconnected via
input/output (I/O) blocks. Like any other technology, FPGAs must be protected against several security threats,
such as cloning of the FPGA bitstream or piracy of the core intellectual property (IP).

The physical unclonable function (PUF) was introduced to address these threats by exploiting the
inherent physical properties of a device. A PUF is a function based on a physical system that is easy to evaluate
but cannot be cloned or reproduced on another copy of the same physical system, even if the functionality is known.
These advantages arise from the process variations introduced in each chip during manufacturing.
PUF designs for security applications in FPGAs include intellectual property
protection, secure key generation, and cloud security [6]-[9]. However, the previous PUFs designed on FPGAs
have drawbacks, such as delay imbalances caused by routing constraints. Therefore, in this study, we propose
a relative placement method to implement symmetric routing in the obfuscated delay-based PUF on the
Digilent Nexys-4 Artix-7 FPGA board. The obfuscated delay-based PUF is an Arbiter-PUF combined with a
random challenge permutation technique to reduce the vulnerability of the Arbiter-PUF to machine
learning attacks (ML-attack). We show that using the relative placement method to implement the delay-based PUF in
an FPGA can eliminate the biased response and achieve good PUF quality in terms of uniqueness, reliability,
and uniformity. Moreover, we show that the random challenge permutation method can be implemented using
routing obfuscation and hence achieves low area overhead.

This work begins with a detailed introduction to related work on PUFs in Section 2. The PUF design
and implementation are described in Section 3, followed by the method used to construct the
architecture of the PUF in Section 4. PUF performance analyses are presented in Section 5, and conclusions are
drawn in Section 6.


2. RELATED WORK 

Majzoobi et al. proposed an Arbiter-PUF (APUF) structure for the FPGA platform based on
programmable delay logic (PDL). The internal structure of the look-up tables (LUTs) is used to construct a
PDL, and two PDLs are combined to form a path-swapping switch. However, this technique introduces a bias at the
beginning and end of the delay paths. Sahoo et al. proposed an improvement to the PDL technique.
The authors used a PDL chain to construct the upper and lower delay paths independently by using the hard
macro function of the Xilinx computer aided design (CAD) tool, and then instantiated the hard macro twice for
a symmetric implementation of the delay paths. However, creating a hard macro of a long PDL chain is a
time-consuming process that limits the flexibility of the design.

In another study, Habib et al. implemented a new ring oscillator (RO) design on an FPGA. In this
design, the internal variations of the FPGA LUTs are exploited to generate a PUF response. The proposed PUF
design achieves good uniqueness, reliability, and uniformity, but it suffers from continuous
dynamic power dissipation due to the oscillation. Elsewhere, Dan et al. implemented a reconfigurable
XOR PUF (R-XOR PUF) on the FPGA. The R-XOR PUF consists of multiplexers and inverters, and its response
is generated by XORing two responses. Nevertheless, this method yields low uniqueness and
reliability. In our study, we explore an efficient method to implement
symmetric routing in an obfuscated delay-based PUF on an FPGA without compromising PUF performance
metrics such as uniqueness, reliability, and uniformity. Moreover, we show that the random challenge permutation in
the obfuscated delay-based PUF can be implemented simply by routing obfuscation, which introduces low area
overhead.


3. DESIGN OF PHYSICAL UNCLONABLE FUNCTION 


The obfuscated delay-based PUF is an Arbiter-PUF combined with a random challenge permutation
technique. The Arbiter-PUF, first introduced by Lee et al. [14], is based on race conditions in integrated
circuits (ICs). Although it is considered one of the strong PUFs, an adversary can easily predict the response
of an Arbiter-PUF by fitting a linear additive delay model with machine learning techniques. To overcome this problem,
a random challenge permutation technique was introduced to reduce the vulnerability of the Arbiter-PUF
to ML-attack.


A critical factor when implementing this PUF design in an FPGA is symmetric routing.
As shown in Figure 1, each CLB in the Xilinx Artix-7 contains a pair of slices, and each slice contains four
6-input LUTs. Relative placement is used to ensure balanced routing for the PUF and to prevent delay
imbalance. In relative placement, each stage of the circuit is implemented in a slice using two
LUTs. The LUTs for the upper path are located at the position of the upper LUTs in the slice, and the LUTs
for the lower path are located at the position of the lower LUTs in the slice. The routing of the path starts from
slice X0Y1 and ends at slice X31Y1, where "X" followed by a number indicates the position of each slice in a pair as well
as the column position of the slice, while "Y" followed by a number indicates a row of slices. A tool command
language (TCL) script is used to automate the relative placement.
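
The relative placement described above can be illustrated by generating the placement constraints programmatically. The following is a minimal sketch, not the script used in this work: it emits `set_property BEL` and `set_property LOC` constraint lines for one row of slices. The cell hierarchy (`inst_delay_line_<i>/inst_mux_1` and `inst_mux_2`, loosely borrowed from the labels in Figure 3), the mapping of stage i to slice column Xi, and the choice of `D6LUT` and `A6LUT` as the "upper" and "lower" LUT positions are all assumptions made for illustration.

```python
# Sketch only: generate Vivado-style placement constraints for a delay line.
# Cell names and the D6LUT/A6LUT choice are hypothetical, not taken from the paper.

def placement_constraints(row, stages=32, prefix="inst_delay_line_",
                          upper_bel="D6LUT", lower_bel="A6LUT"):
    """Return constraint lines pinning each stage to slice X<stage>Y<row>."""
    lines = []
    for stage in range(stages):
        site = f"SLICE_X{stage}Y{row}"
        # One LUT per path: the upper-path mux goes to the upper LUT position,
        # the lower-path mux to the lower LUT position of the same slice.
        for mux, bel in (("inst_mux_1", upper_bel), ("inst_mux_2", lower_bel)):
            cell = f"{prefix}{stage}/{mux}"
            # BEL is emitted before LOC for each cell.
            lines.append(f"set_property BEL {bel} [get_cells {{{cell}}}]")
            lines.append(f"set_property LOC {site} [get_cells {{{cell}}}]")
    return lines


if __name__ == "__main__":
    for line in placement_constraints(row=1)[:4]:
        print(line)
```

Because every stage uses the same two LUT positions in its slice, the upper and lower paths see the same relative geometry from stage to stage, which is the property the relative placement method relies on.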


Figure 1. Xilinx Artix-7 fabric (labels: CLB, slices, look-up tables (LUTs))


As shown in Figure 2, the obfuscated delay-based PUF was designed with 32 stages of switching
components and a D flip-flop as the arbiter on the Artix-7. A MicroBlaze processor block design was created to write
and read the challenge-response pairs (CRPs). The architecture of the processor includes general purpose
input output (GPIO) 0 and 1. These GPIOs were used to send pulses and challenge bits and to fetch the response from
the PUF module. The Artix-7 board has a built-in 100 MHz oscillator that is used as the clock input to the FPGA,
and the board communicates with the PC via a 115,200 bps universal asynchronous receiver transmitter (UART).
Figure 2. System functional block diagram (label: N switching stages)


4. METHODOLOGY 

Xilinx Vivado 2018.3 is used to synthesize and implement the obfuscated delay-based PUF on the Digilent
Nexys-4 Artix-7 FPGA board. For PUF performance evaluation, a sufficient number of PUFs must be implemented
to obtain statistically significant results. For cost reasons and following the method described in [16], five
obfuscated delay-based PUFs were implemented in a single FPGA, each on a different set of slices. We assume that
each PUF instance represents an FPGA device. The routing of the path runs from slice X0Y150 to X31Y150
for PUF 1, X0Y151 to X31Y151 for PUF 2, X0Y152 to X31Y152 for PUF 3, X0Y153 to X31Y153 for PUF
4, and X0Y154 to X31Y154 for PUF 5. A 32-bit response is generated for each of the PUF instances. The
placement constraints in Vivado were applied using TCL scripts. A MicroBlaze soft processor core with the PUF
architectural design was designed and implemented for writing and reading the CRPs. In addition, an integrated logic
analyzer (ILA) was connected to the MicroBlaze to display the discrete waveform of the pulse, the challenge bit,
and the collected response in the hardware manager of Xilinx Vivado. The MicroBlaze program was written in
the C language in the Xilinx software development kit (SDK) for simulation purposes. The Tera Term software
is connected to the MicroBlaze processor through a serial data interface to collect CRPs. Subsequently, the
collected CRPs are analyzed using MATLAB to evaluate the uniqueness, reliability, and uniformity. For the
ML-attack evaluation, 32,000 CRPs are applied to the PUF, where each challenge generates one bit of response.
Following the method described in [15], the artificial neural network (ANN) technique is used to evaluate the
resiliency of the PUF against ML-attack.


5. SIMULATION RESULTS AND ANALYSIS 
5.1. Physical unclonable function implementation 


Figure 3 shows the implementation design for the switching component. Inside the
switching component, there are two multiplexers implemented by LUTs. Each LUT has three inputs, namely
I0, I1, and I2. The challenge bit input (I1) controls the path inside the LUTs. When the challenge bit
is '0', the pulse signal is routed through I0. When the challenge bit is '1', the pulses cross and pass through
I2. Each switching component has its own challenge bit. In our work, there are 32 switching components,
and their challenge bits are randomly permuted. The last switching component is connected to a D flip-flop,
which is the arbiter block. The upper path of the last switching component is connected to the data input, and
the lower path is connected to the clock of the D flip-flop.
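
The behaviour described above can be summarised in a small software model. The sketch below is illustrative only and is not the hardware design: each stage either passes the two racing pulses straight through (challenge bit 0) or crosses them (challenge bit 1), the challenge bits are wired to the stages through a fixed random permutation, and the arbiter outputs 1 when the pulse on the upper path (data input) arrives before the pulse on the lower path (clock). The Gaussian per-stage delays, the seed, and the names `make_puf` and `evaluate` are assumptions introduced for this example.

```python
import random

# Behavioural sketch of an arbiter PUF with permuted challenge bits.
# Delay values are synthetic (assumed Gaussian); they do not model a real device.

def make_puf(stages=32, seed=0):
    """Create per-stage delays and a fixed random challenge permutation."""
    rng = random.Random(seed)
    # Per stage: (upper straight, lower straight, upper-to-lower, lower-to-upper).
    delays = [tuple(rng.gauss(100.0, 1.0) for _ in range(4)) for _ in range(stages)]
    permutation = list(range(stages))
    rng.shuffle(permutation)  # stage i is driven by challenge bit permutation[i]
    return delays, permutation


def evaluate(puf, challenge):
    """Return the 1-bit response for a list of challenge bits (0 or 1)."""
    delays, permutation = puf
    upper = lower = 0.0  # accumulated arrival times on the two paths
    for stage, (d_uu, d_ll, d_ul, d_lu) in enumerate(delays):
        if challenge[permutation[stage]] == 0:
            upper, lower = upper + d_uu, lower + d_ll   # straight
        else:
            upper, lower = lower + d_lu, upper + d_ul   # crossed
    # Arbiter (D flip-flop): upper path drives D, lower path drives the clock.
    return 1 if upper < lower else 0


if __name__ == "__main__":
    puf = make_puf()
    rng = random.Random(1)
    challenge = [rng.randint(0, 1) for _ in range(32)]
    print(evaluate(puf, challenge))
```

In this model the permutation is part of the device description rather than of the challenge, which mirrors the idea that the obfuscation is realised in the wiring between the challenge inputs and the stages.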

Figure 4 illustrates the connection between the MicroBlaze processor and its slave modules. Each
slave has its own address to communicate with the processor through the peripheral module.
The Arbiter-PUF module has two inputs and one output, namely ipulse, ichallenge, and oresponse. The
32-bit challenge is generated by GPIO 0 and connected to the ichallenge input of the PUF. The pulse from GPIO 1
(channel 1) is connected to the ipulse input of the PUF. Any response generated by the PUF is fetched from the oresponse
output of the PUF through GPIO 1 (channel 2). This entire process is controlled through the UART module set up
in the MicroBlaze processor to collect the CRPs.


Figure 3. Switching component implementation (schematic labels: inst_delay_line_31, inst_mux_1, inst_mux_2, inst_dff, dff_primitive.FDRE_inst_0)

Figure 4. MicroBlaze processor block diagram


Secure lightweight obfuscated delay-based physical unclonable function design ... (Mohammad Haziq Ishak) 


1080 ISSN: 2302-9285


Figure 5 shows the complete balanced routing map for a switching component. The
routing of the upper and lower paths is symmetric, which supports an unbiased PUF response
and indicates that the relative placement method was applied successfully. Table 1 shows the
total accumulated delay of the upper and lower paths for the five PUF placements in the
FPGA. According to Pundir et al. [17], Arbiter-PUF implementation on an FPGA is challenging because the
two paths must be symmetric and similar, so that the difference in the generated response is due to variations in the
fabrication process rather than architectural delays in the design. It can be observed from Table 1 that for each
PUF placement, the delay difference between the upper and lower paths is small relative to the total path delay and
varies in sign, which indicates that the manual routing technique used in this study was implemented successfully in the
FPGA.


Figure 5. Upper and lower routing path (labels: upper LUTs path, lower LUTs path)


Table 1. Total accumulated delay for upper and lower paths

PUF      Upper (ps)   Lower (ps)
PUF 1    18601        18146
PUF 2    17906        17927
PUF 3    18162        18144
PUF 4    17719        17667
PUF 5    18124        18146
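
The delay differences implied by Table 1 can be computed directly from the tabulated values. The short sketch below does this; the relative imbalance (absolute difference divided by the mean of the two path delays) is an illustrative figure of merit introduced here, not a metric defined in this work.

```python
# Upper and lower accumulated path delays in picoseconds, copied from Table 1.
TABLE_1 = {
    "PUF 1": (18601, 18146),
    "PUF 2": (17906, 17927),
    "PUF 3": (18162, 18144),
    "PUF 4": (17719, 17667),
    "PUF 5": (18124, 18146),
}


def delay_differences(table):
    """Signed upper-minus-lower delay difference (ps) for each PUF."""
    return {name: upper - lower for name, (upper, lower) in table.items()}


def relative_imbalance(table):
    """Absolute difference as a percentage of the mean path delay (assumed metric)."""
    return {
        name: 100.0 * abs(upper - lower) / ((upper + lower) / 2.0)
        for name, (upper, lower) in table.items()
    }


if __name__ == "__main__":
    for name, diff in delay_differences(TABLE_1).items():
        print(f"{name}: {diff:+d} ps")
```

For the values in Table 1, the signed differences are 455, -21, 18, 52, and -22 ps for PUF 1 to PUF 5, respectively.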


5.2. Performance metrics 

A 32-bit response is generated for each PUF instance using the method described in Section 4. The
response is used to quantify the PUF performance using standard evaluation metrics. As described in [18], three
metrics to evaluate the PUF performance are uniqueness, reliability, and uniformity. Uniqueness is the ability of
a PUF to be uniquely distinguished from a group of PUFs of a similar type. Reliability is the
ability of the PUF to produce the same responses when the same challenges are applied under temperature
and supply voltage fluctuations. Uniformity is the proportion of 0's and 1's in a PUF response. The
number of 0's and 1's in a PUF response should be balanced, so the ideal uniformity is 50%.
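
As an illustration of how these three metrics are typically computed from collected responses, the sketch below uses the commonly used Hamming-distance formulations: uniformity as the percentage of 1's in a response, uniqueness as the average pairwise inter-device Hamming distance, and reliability as 100% minus the average intra-device Hamming distance to a reference response. Whether the MATLAB analysis in this work uses exactly these formulas is an assumption; the function names are introduced here for illustration.

```python
# Sketch of common PUF quality metrics on responses given as lists of bits.
# The exact formulas used in the paper's analysis are assumed, not confirmed.

def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))


def uniformity(response):
    """Percentage of 1's in a single response (ideal: 50%)."""
    return 100.0 * sum(response) / len(response)


def uniqueness(responses):
    """Average pairwise inter-device Hamming distance in percent (ideal: 50%)."""
    k, n = len(responses), len(responses[0])
    total = sum(
        hamming_distance(responses[i], responses[j])
        for i in range(k) for j in range(i + 1, k)
    )
    return 100.0 * 2 * total / (k * (k - 1) * n)


def reliability(reference, repeats):
    """100% minus the average intra-device Hamming distance to the reference."""
    n = len(reference)
    intra = sum(hamming_distance(reference, r) for r in repeats) / (len(repeats) * n)
    return 100.0 * (1.0 - intra)


if __name__ == "__main__":
    devices = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
    print(uniqueness(devices), uniformity(devices[0]))
```

With five PUF instances, as used in this study, the uniqueness average runs over ten device pairs.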

Table 2 compares the performance of the obfuscated delay-based PUF design with previously
proposed PUFs. As can be seen in Table 2, PCS-PUF has the highest uniqueness with 49.80% and the M-PUF
technique has the lowest with 40.60%. Our approach also achieves a high uniqueness of
48.75%. For reliability, 10 sets of one hundred 32-bit responses of the obfuscated delay-based PUF are collected
under a nominal condition of 1.1 V supply voltage and 24 °C. The first set of responses is used as the reference
against which the other nine sets are compared. The results show that our design achieves a value close to the ideal
of 100% and higher than those of the other PUFs. The response generated by the PUF is stable even though the CRP
collection process was repeated 9 times. In terms of uniformity, RO-PUF has the highest uniformity with
50.13%, followed by PCS-PUF with 49.77%. The uniformity of our approach is 52.5%, which is slightly
higher than the ideal value. Our work achieves efficient resource utilisation, consuming 64 LUTs and 32 slices of
hardware resources. From Table 2, we can conclude that our approach achieves good results in both
PUF quality metrics and area overhead in the FPGA.


Table 2. A comparison of hardware resource consumption and metrics of different PUF designs

PUF design     Uniqueness (%)   Reliability (%)   Uniformity (%)   FPGA             Area overhead          Predictability (%)
PCS            49.80            98.19             49.77            Spartan-3E       388 LUTs, 196 slices   -
RO-PUF         -                -                 -                Spartan-3E       -                      85.20
PDL            45.25            97.12             50.34            Spartan-3E       -                      -
MISR           49.30            99.80             -                Xilinx ZC706     380 LUTs, 128 FF
MPUF           40.60            -                 37.03            Xilinx Artix 7   -                      -
Lattice PUF    50.00            -                 49.98            Spartan-6        -                      50.00
APUF           44.3             96.00             48.45            Spartan-6        234 slices             -
R-XOR          40.00            -                 -                Virtex 5         268 LUTs               -
Rec-DAPUF      -                94.80             -                Xilinx Artix 7   64 LUTs, 55 slices     64.90
Our approach   48.75            99.99             52.5             Xilinx Artix 7   64 LUTs, 32 slices     44.50


5.3. Machine learning attacks


Another critical criterion for a PUF is its resistance to ML-attack. In a previous study, Hospodar et al.
showed that the ANN technique predicts the response of the Arbiter-PUF and XOR Arbiter-PUF better
than the support vector machine (SVM). In our study, the neural network classifier was optimized by
trial and error [27]. This procedure found that the best prediction accuracy
is achieved with two hidden layers, five neurons per layer, and the "logsig" activation function.
Resilient backpropagation was chosen as the training algorithm of the neural network classifier because it is
faster and more accurate than the other training algorithms. A total of 30,000 CRPs were used as the training data set and
the remaining 2,000 CRPs were used as the testing data set.
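
The data split and the predictability figure reported in this section can be expressed compactly. The sketch below is a hedged illustration only: it partitions a list of (challenge, response) pairs into 30,000 training and 2,000 testing pairs and measures predictability as the percentage of test responses that a given model predicts correctly. The shuffling step, the seed, and the names `split_crps` and `prediction_accuracy` are assumptions; the neural network itself is not reproduced here, and any callable that maps a challenge to a predicted bit can stand in for it.

```python
import random

# Sketch: train/test split of CRPs and the predictability (accuracy) measure.
# The trained ANN is represented by an arbitrary callable `predict`.

def split_crps(crps, n_train=30000, seed=0):
    """Shuffle (assumed step) and split CRPs into training and testing sets."""
    shuffled = list(crps)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def prediction_accuracy(predict, test_set):
    """Percentage of test CRPs whose response bit is predicted correctly."""
    hits = sum(predict(challenge) == response for challenge, response in test_set)
    return 100.0 * hits / len(test_set)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic CRPs for demonstration only: random 32-bit challenges and bits.
    crps = [(tuple(rng.randint(0, 1) for _ in range(32)), rng.randint(0, 1))
            for _ in range(32000)]
    train, test = split_crps(crps)
    print(len(train), len(test), prediction_accuracy(lambda c: 0, test))
```

For a one-bit response with balanced 0's and 1's, a model that guesses at random is expected to score about 50% on this measure.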

Based on Table 2, the lattice PUF reduces the predictability of the
response under ML-attack to the ideal value of 50%, followed by Rec-DAPUF and RO-PUF with 64.90% and
85.20%, respectively. All of these works used an ANN, the same method as in this study, to predict the
response. Although these designs reduce the susceptibility to ML-attack, their area overhead
is high owing to the complexity of the design. With our design, the predictability
is reduced to 44.50%. Our random challenge permutation technique for reducing the susceptibility of an
Arbiter-PUF to ML-attack incurs low area overhead because it can be implemented by routing obfuscation.
Therefore, it is suitable for lightweight security devices.


6. CONCLUSION 

In this paper, the obfuscated delay-based PUF has been implemented on the Digilent Nexys-4 Artix-7
FPGA board. The proposed PUF was implemented by balancing and constraining the routing using
a TCL script. The delay analysis results show that a symmetric delay path was achieved by using manual
routing to place the switching components and the arbiter block on the FPGA. In terms of PUF
performance metrics, the proposed PUF achieved a uniqueness of 48.75%, a reliability of 99.99%, and a uniformity
of 52.5%. These results show that the obfuscated delay-based PUF achieves good PUF quality. Moreover, we
have also shown that the obfuscated delay-based PUF reduces the susceptibility of the conventional
Arbiter-PUF to ML-attack from 98% to 44.50% without requiring additional complex circuitry on the FPGA, which
would increase the area consumption. These findings indicate that our proposed PUF design is suitable for
lightweight identification and authentication applications in resource-constrained IoT devices.